The Exponential Distribution

This document explores both the maths behind the exponential distribution and how those formulas can be represented in code.

Note: These code snippets are meant to demonstrate the concepts; see established modules such as scipy for more robust and widely used implementations.


In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

The Maths

The exponential distribution is defined by its cumulative distribution function (cdf or F(x)) and associated probability density function (pdf).

Cumulative Distribution Function

The cumulative distribution can be interpreted as answering: "What is the chance that a randomly selected value is less than $x$?" Since this probability can only increase or stay constant as $x$ grows, the cdf is a monotonically non-decreasing function.

There is only one parameter, $\lambda$ (the rate), used to differentiate between exponential distributions; it governs the curvature of the cdf.

$$ \text{F}(x) = 1 - e^{-\lambda x} $$

In [2]:
def cdf(lam, x):
    return 1 - np.exp(-lam * x)

def plot_cdf(lam, y_scale=1):
    x = np.linspace(0,1,100)
    y = [cdf(lam, xi) * y_scale for xi in x]
    plt.plot(x, y, label=r"$\lambda=%g$" % lam, lw=2)

for lam in [8.6, 3.6, 1.6, 0.1]:
    plot_cdf(lam)
plt.xlabel("x")
plt.ylabel("$Quantile\ F(x)$")
plt.legend(loc='lower right')
plt.grid()
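For example, with $\lambda = 3.6$ the chance that a randomly selected value falls below 0.5 can be read directly off the cdf; a quick check using the cdf helper above:

# P(X < 0.5) for lambda = 3.6 is 1 - exp(-1.8), roughly 0.83
print(cdf(3.6, 0.5))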


Probability Density Function

The probability density function is analogous to a normalized histogram of the data and can be calculated as the rate of change of the cdf. $$ \text{pdf}(x) = \frac{d}{dx} F(x) = \lambda e^{-\lambda x} $$


In [3]:
def pdf(lam, x):
    return lam * np.exp(-lam * x)

def plot_pdf(lam):
    x = np.linspace(0,1,100)
    y = [pdf(lam, xi) for xi in x]
    plt.plot(x, y, label=r"$\lambda=%g$" % lam, lw=2)

for lam in [8.6, 3.6, 1.6, 0.1]:
    plot_pdf(lam)
plt.xlabel("x")
plt.ylabel("$pdf(x)$")
plt.legend()
plt.grid()
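Since the pdf is the derivative of the cdf, a numerical gradient of the cdf should trace the same curve; a quick sketch of that check:

# The finite-difference derivative of the cdf should track the closed-form pdf,
# up to discretization error from the grid spacing
xs = np.linspace(0, 1, 100)
print(np.max(np.abs(np.gradient(cdf(3.6, xs), xs) - pdf(3.6, xs))))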


Generating Exponentially Distributed Data

Random numbers can be generated from an arbitrary distribution by combining a uniform random number generator (RNG) with the cumulative distribution function $F(x)$. The uniform RNG generates a value on the range $[0, 1)$, which can be translated into the corresponding quantile of the desired distribution. For example, if the RNG generates 0.5, this corresponds to the 50th percentile, the median, of the distribution. Converting the 0.5 into the distribution's median value is accomplished by inverting the cumulative distribution function. $$ F(x) = 1 - e^{-\lambda x} \\ Quantile = 1 - e^{-\lambda x} \\ e^{-\lambda x} = 1 - Quantile \\ -\lambda x = \ln{ \left( 1 - Quantile \right) } \\ x = -\frac{1}{\lambda}\ln{ \left( 1 - Quantile \right) } \\ $$ Note: since $1 - Quantile$ is itself uniformly distributed, the simplification $$ x = -\frac{1}{\lambda}\ln{ \left( Quantile \right) } $$ can be used in an RNG, although each input then maps to the $(100\% - n)$th quantile rather than the $n$th.


In [4]:
def inv_cdf(lam, quantile):
    return 1./-lam * np.log(1 - quantile)
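
Both forms yield the same distribution because if $Quantile$ is uniform on $[0, 1)$, then $1 - Quantile$ is uniform as well; a quick sketch comparing the two (both sample means should land near $1/\lambda \approx 0.278$ for $\lambda = 3.6$):

# If u is uniform, so is 1 - u, so the simplified formula -ln(u)/lam is an
# equally valid sampler (ignoring the negligible chance of u being exactly 0)
u = np.random.rand(100000)
print(np.mean(inv_cdf(3.6, u)), np.mean(-np.log(u) / 3.6))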

Below, a quantile of 0.5 is selected and then mapped to the appropriate $x$ value in the distribution ($x \approx 0.2$).


In [5]:
lam = 3.6

x = np.linspace(0,1,100)
y = [cdf(lam, xi) for xi in x]
plt.plot(x, y, label=r"$\lambda=%g$" % lam, lw=2)

def map_to_cdf(yi, alpha=1):
    xi = inv_cdf(lam, yi)
    plt.arrow(0, yi, xi, 0, alpha=alpha, color='red', ls='dashed', length_includes_head=True)
    plt.arrow(xi, yi, 0, -yi, alpha=alpha, color='red', ls='dashed', length_includes_head=True)

map_to_cdf(0.5)
plt.xlabel("x")
plt.ylabel("$pdf(x)$")
plt.legend(loc='lower right')
plt.grid()
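The exact value can be verified directly, since the median of an exponential distribution is $\frac{\ln 2}{\lambda}$:

# ln(2) / 3.6 ~= 0.19, matching the arrow diagram above
print(inv_cdf(lam, 0.5))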


Repeating this process 150 times below shows the uniform distribution of quantiles along the y-axis and the generated exponential distribution along the x-axis, with higher concentrations at the more probable lower values.


In [6]:
x = np.linspace(0,1,100)
y = [cdf(lam, xi) for xi in x]
plt.plot(x, y, label=r"$\lambda=%g$" % lam, lw=2)

for val in np.random.rand(150):
    map_to_cdf(val, 0.2)

plt.xlabel("x")
plt.ylabel("$pdf(x)$")
plt.grid()


This process of generating 100 random numbers is repeated below, and the resulting distribution is compared against the exponential distribution's probability density function.


In [7]:
def gen_exponential(lam, N):
    for _ in range(N):
        yield inv_cdf(lam, np.random.rand())

In [8]:
N=100

x = np.array([val for val in gen_exponential(lam, N)], dtype=float)
plt.xlim([0,1])
# To keep the plots consistent, only plot values from 0 to 1
plt.hist(x[x < 1], bins=20, density=True, alpha=0.4, label=r'simulated $\lambda$=%g' % lam)
plot_pdf(lam)
plt.legend()
plt.grid()
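numpy also ships a built-in exponential sampler, parameterised by the scale $1/\lambda$ rather than the rate; as a sanity check, its sample mean should agree with our generator's (both near $1/\lambda$):

# numpy's built-in sampler takes scale = 1 / lambda
builtin = np.random.exponential(scale=1. / lam, size=N)
print(np.mean(x), np.mean(builtin))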


So far, we've explored the mathematics behind the exponential distribution, converted it into code to create an exponential RNG, and verified that the generated distribution appears to match the theoretical one. Next we work backwards: given a data set that is believed to be exponentially distributed, we determine the most likely value of $\lambda$ that would have generated it.

Fitting Exponentially Distributed Data

Previously we've worked from a known $\lambda$ to create a distribution. Now, given a data set, we'd like to obtain the maximum likelihood estimate $\hat{\lambda}$. statlect.com has a nice derivation of the maximum likelihood estimate for $\lambda$, which turns out to be the reciprocal of the sample mean. $$ \hat{\lambda} = \frac{N}{\sum_{i=1}^N x_i} = 1/\bar{X} $$


In [9]:
lambda_est = 1/np.mean(x)
print("estimate for lambda=%g which is %g%% off the true value of %g" % (lambda_est, (lambda_est-lam)/lam*100, lam))


estimate for lambda=3.85448 which is 7.06901% off the true value of 3.6
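
As a cross-check, scipy can fit an exponential distribution directly; with the location parameter pinned at zero, the fitted scale is the sample mean, so the implied rate should reproduce the estimate above:

from scipy import stats
# floc=0 pins the location at zero; the fitted scale is then the sample mean
loc, scale = stats.expon.fit(x, floc=0)
print(1. / scale)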

How Good of a Fit?

Now that we have the most likely estimate for $\lambda$ from the data, the final step is to quantify how well the estimated distribution matches the observed one. One such method is the Kolmogorov-Smirnov (KS) test, which compares the expected cumulative distribution function with the observed empirical cumulative distribution function. Recall from above that the cdf for an exponential distribution is:


In [10]:
plot_cdf(lam)
plt.xlabel("x")
plt.ylabel("$Quantile\ F(x)$")
plt.legend(loc='lower right')
plt.grid()


To illustrate the comparison, we'll change the y-axis from the quantile to the number of samples. A y-axis value of 60 can then be interpreted as:

"For 100 independent and identically distributed (iid) random samples, we would expect about 60 of them to have a value less than 0.25."


In [11]:
plot_cdf(lam, len(x))
plt.xlabel("x")
plt.ylabel("Number of Samples < x")
plt.legend(loc='lower right')
plt.grid()


In order to create the observed empirical cdf, we sort the data and plot a step-wise running tally as the values increase. For an arbitrary $x$ we can then determine how many values in the data set are less than or equal to $x$.


In [12]:
def plot_ecdf(x, y_scale=1):
    def gen_x(x):
        yield 0
        for xi in sorted(x):
            yield xi
            yield xi
    def gen_y(x):
        yield 0
        yield 0
        for yi in range(1, len(x) + 1):
            yield float(yi) / len(x) * y_scale
            yield float(yi) / len(x) * y_scale
    pts = list(zip(gen_x(x), gen_y(x)))
    xs = [pt[0] for pt in pts]
    ys = [pt[1] for pt in pts]
    plt.plot(xs, ys, label="Observed CDF", lw=2)

plot_cdf(lam, len(x))
plot_ecdf(x[x<1], len(x))
plt.xlabel("x")
plt.ylabel("Number of Samples < x")
plt.legend(loc='lower right')
plt.grid()
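The same "how many samples are at or below $x$" query can also be answered without a plot; a small sketch built on np.searchsorted (the count_below helper is ours):

def count_below(data, x0):
    # Binary search on the sorted data for the number of values <= x0
    return np.searchsorted(np.sort(data), x0, side='right')

print(count_below(x, 0.25))  # roughly 100 * F(0.25), i.e. about 59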


Convert the y-axis back to quantiles by dividing by the number of samples:


In [13]:
plot_cdf(lam)
plot_ecdf(x[x<1])
plt.xlabel("x")
plt.ylabel("$Quantile\ F(x)$")
plt.legend(loc='lower right')
plt.grid()


Now we have a visual comparison between the expected cdf and the empirical cdf. Next we quantify the differences between expected and observed. There are a number of tests that can be used; below, we'll consider the Kolmogorov-Smirnov test.

Kolmogorov-Smirnov

The Kolmogorov-Smirnov test quantifies the difference between two cdfs by searching for the largest vertical distance between them. The larger the gap, the less likely it is that the two distributions were drawn from the same population. An advantage of this type of test is that it is simple to visualize and calculate. A disadvantage is that it reduces the entire distribution to a single worst-case point instead of quantifying the differences across the whole distribution.


In [14]:
def ks_exp_test(x, lam):
    sx = np.sort(np.asarray(x, dtype=float))
    n = len(sx)
    fx = cdf(lam, sx)

    # If the largest residual is above the cdf, it occurs at the top of a step,
    # where the ecdf sits at i/n just after the i-th sample
    D_above = np.max(np.abs(np.arange(1, n + 1, dtype=float) / n - fx))
    # If the largest residual is below the cdf, it occurs at the bottom of a step,
    # where the ecdf sits at (i-1)/n just before the i-th sample
    D_below = np.max(np.abs(np.arange(0, n, dtype=float) / n - fx))
    return max(D_above, D_below)

Because of the step-wise nature of the ecdf, if the ecdf is above the cdf, the largest distance will be found at the start of a step. If the ecdf is below the cdf, the largest distance will be found at the end of the step.


In [15]:
D = ks_exp_test(x, lam)
print("There is 1 point in the sample that is %g away from its expected quantile" % D)


There is 1 point in the sample that is 0.0952405 away from its expected quantile

This calculation can also be performed with the scipy package, which returns an associated p-value for hypothesis testing.


In [16]:
from scipy import stats
D, p_value = stats.kstest(x, 'expon', args=(0, 1./lam))
print(D, p_value)


0.095240450552106337 0.3059836431935834